18 research outputs found
Building a Test Collection for Significant-Event Detection in Arabic Tweets
With the increasing popularity of microblogging services like Twitter, researchers discovered a rich medium for tackling real-life problems like event detection. However, event detection in Twitter is often obstructed by the lack of public evaluation mechanisms such as test collections (sets of tweets, labels, and queries used to measure the effectiveness of an information retrieval system). The problem is more evident when non-English languages, e.g., Arabic, are concerned. With the recent surge of significant events in the Arab world, news agencies and decision makers rely on Twitter's microblogging service to obtain recent information on events. In this thesis, we address the problem of building a test collection of Arabic tweets (named EveTAR) for the task of event detection.
To build EveTAR, we first adopted an adequate definition of an event, which is a significant occurrence that takes place at a certain time; an occurrence is significant if there are news articles about it. We collected Arabic tweets using Twitter's streaming API. Then, we identified a set of events from the Arabic data collection using Wikipedia's current events portal. Corresponding tweets were extracted by querying the Arabic data collection with a set of manually-constructed queries. To obtain relevance judgments for those tweets, we leveraged CrowdFlower's crowdsourcing platform.
Over a period of 4 weeks, we crawled over 590M tweets, from which we identified 66 events that cover 8 different categories, and gathered more than 134k relevance judgments. Each event contains an average of 779 relevant tweets. Over all events, we obtained an average Kappa of 0.6, which indicates substantial inter-annotator agreement. EveTAR was used to evaluate three state-of-the-art event detection algorithms. The best-performing algorithms achieved 0.60 in F1 measure and 0.80 in both precision and recall. We plan to make our test collection available for research, including event descriptions, manually-crafted queries to extract potentially-relevant tweets, and all judgments per tweet. EveTAR is the first Arabic test collection built from scratch for the task of event detection. Additionally, we show in our experiments that it supports other tasks like ad-hoc search.
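The inter-annotator agreement reported above is a Cohen's Kappa score, which corrects raw agreement for agreement expected by chance. A minimal sketch of how such a score is computed with scikit-learn (the toy labels below are illustrative, not EveTAR judgments):

```python
from sklearn.metrics import cohen_kappa_score

# Toy relevance judgments from two annotators (1 = relevant, 0 = not relevant).
annotator_a = [1, 1, 0, 1, 0, 0]
annotator_b = [1, 1, 0, 0, 0, 1]

# Raw agreement here is 4/6, but half of that is expected by chance,
# so Kappa is considerably lower.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(round(kappa, 4))  # → 0.3333
```

On the conventional Landis-Koch scale, values in the 0.61-0.80 range (like EveTAR's) are read as "substantial" agreement.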
Towards NLP-based Semi-automatic Preparation of Content for Language Learning using LingoSnacks m-Learning Platform
Vocabulary growth is an important element of language learning, but it requires repeated and varied exposure to new words and their usage in different contexts. However, preparing suitable learning content for effective language learning remains a challenging and time-consuming task. This paper reports the experience of designing and developing an m-Learning platform (named LingoSnacks) for semi-automatic preparation of language-learning content using Natural Language Processing (NLP) services. The LingoSnacks Authoring Tools provide an environment for assisted authoring of learning content and for delivering it to the learner in game-like interactive learning activities. Empirical testing results from teachers who used LingoSnacks indicate that the platform eased their lesson-preparation tasks. The resulting learning packages also helped learners with vocabulary acquisition: the number of new vocabulary items that learners could recognize, recall, and retain was significantly higher than for participants who used only conventional classroom lessons.
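As one illustration of the kind of NLP service such a platform could draw on, candidate vocabulary for a lesson can be surfaced by ranking a text's longer words by frequency. A stdlib-only sketch; the helper name and thresholds are hypothetical, not part of LingoSnacks:

```python
import re
from collections import Counter

def candidate_vocabulary(text, top_n=3, min_len=5):
    """Rank longer words by frequency as rough vocabulary candidates."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_len)
    return [word for word, _ in counts.most_common(top_n)]

lesson_text = (
    "The volcano erupted at dawn. Ash from the volcano covered the village, "
    "and the village elders recalled an earlier eruption."
)
print(candidate_vocabulary(lesson_text))  # → ['volcano', 'village', 'erupted']
```

A production system would add lemmatization and difficulty grading, but the frequency ranking above shows where automation can save an author time.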
PROVOKE: Toxicity trigger detection in conversations from the top 100 subreddits
Promoting healthy discourse on community-based online platforms like Reddit can be challenging, especially when conversations show ominous signs of toxicity. Therefore, in this study, we find the turning points (i.e., toxicity triggers) making conversations toxic. Before finding toxicity triggers, we built and evaluated various machine learning models to detect toxicity from Reddit comments.
Subsequently, we used our best-performing model, a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model that achieved an area under the receiver operating characteristic curve (AUC) score of 0.983, to detect toxicity. Next, we constructed conversation threads and used the toxicity prediction results to build a training set for detecting toxicity triggers. This procedure entailed using our large-scale dataset to refine the definition of toxicity triggers and to build a trigger detection dataset from 991,806 conversation threads drawn from the top 100 communities on Reddit. Then, we extracted a set of sentiment-shift, topical-shift, and context-based features from the trigger detection dataset and used them to build a dual-embedding biLSTM neural network that achieved an AUC score of 0.789. Analysis of our trigger detection dataset showed that some triggering keywords, like ‘racist’ and ‘women’, are common across all communities, whereas others are specific to certain communities, like ‘overwatch’ in r/Games. The implication is that toxicity trigger detection algorithms can leverage generic approaches but must also tailor detection to specific communities. © 2022 Wuhan University. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
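The AUC scores reported above summarize how well a classifier's toxicity probabilities rank toxic comments above non-toxic ones. A minimal sketch of how such a score is computed with scikit-learn (the labels and probabilities below are toy values, not model outputs from the paper):

```python
from sklearn.metrics import roc_auc_score

# Ground-truth toxicity labels (1 = toxic) and toy classifier probabilities.
y_true = [0, 0, 1, 1]
y_prob = [0.10, 0.40, 0.35, 0.80]

# AUC is the probability that a randomly chosen toxic comment is scored
# higher than a randomly chosen non-toxic one; here 3 of 4 pairs are ordered
# correctly, so AUC = 0.75.
auc = roc_auc_score(y_true, y_prob)
print(auc)  # → 0.75
```

Because AUC is threshold-free, it suits imbalanced data like toxicity corpora, where toxic comments are the minority class.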
The illusion of data validity: Why numbers about people are likely wrong
This reflection article addresses a difficulty faced by scholars and practitioners working with numbers about people: those who study people want numerical data about them, yet time and time again this numerical data is wrong. Addressing the potential causes of this wrongness, we present examples of analyzing people numbers, i.e., numbers derived from digital data by or about people, and discuss the comforting illusion of data validity. We first lay a foundation by highlighting potential inaccuracies in collecting people data, such as selection bias. Then, we discuss inaccuracies in analyzing people data, such as the flaw of averages, followed by a discussion of errors made when trying to make sense of people data through techniques such as posterior labeling. Finally, we discuss a root cause of people data often being wrong: the conceptual conundrum of thinking the numbers are counts when they are actually measures. Practical solutions to address this illusion of data validity are proposed. The implications for theories derived from people data are also highlighted, namely that these people theories are generally wrong, as they are often derived from people numbers that are wrong.
How Do Users Perceive Deepfake Personas? Investigating the Deepfake User Perception and Its Implications for Human-Computer Interaction
Although deepfakes have a negative connotation in human-computer interaction (HCI) due to their risks, they also present many opportunities, such as communicating user needs in the form of a “living, talking” deepfake persona. To scope and better understand these opportunities, we present a qualitative analysis of 46 participants’ think-aloud transcripts based on interacting with deepfake personas and human personas, representing a potentially beneficial application of deepfakes for HCI. Our qualitative analysis of 92 think-aloud records indicates five central themes in user perception of deepfakes: (1) Realism, (2) User Needs, (3) Distracting Properties, (4) Added Value, and (5) Rapport. The results indicate various challenges in deepfake user perception that technology developers need to address before the potential of deepfake applications can be realized for HCI. © 2023 Copyright held by the owner/author(s). ACM ISBN 979-8-4007-0806-0/23/09. https://doi.org/10.1145/3605390.3605397. This work is licensed under a Creative Commons Attribution 4.0 International License.
Mapping online hate: A scientometric analysis on research trends and hotspots in research on online hate
Internet and social media participation open doors to a plethora of positive opportunities for the general public. However, in addition to these positive aspects, digital technology also provides an effective medium for spreading hateful content in the form of cyberbullying, bigotry, hateful ideologies, and harassment of individuals and groups. This research investigates the growing body of online hate research (OHR) by mapping general research indices, prevalent research themes, research hotspots, and influential stakeholders such as organizations and contributing regions. For this, we use scientometric techniques and collect research papers from the Web of Science core database published through March 2019. We apply a predefined search strategy to retrieve peer-reviewed OHR and analyze the data using CiteSpace software, identifying influential papers, research themes, and collaborating institutions. Our results show that higher-income countries contribute most to OHR, with Western countries accounting for most of the publications, funded by North American and European funding agencies. We also observed increased research activity post-2005, rising from more than 50 publications to more than 550 in 2018; this growth applies to both publications and citations. The hotbeds of OHR focus on cyberbullying, social media platforms, co-morbid mental disorders, and profiling of aggressors and victims. Moreover, we identified four main clusters of OHR: (1) Cyberbullying, (2) Sexual solicitation and intimate partner violence, (3) Deep learning and automation, and (4) Extremist and online hate groups, which highlight the cross-disciplinary and multifaceted nature of OHR as a field of research. The research has implications for researchers and policymakers engaged in OHR and its associated problems for individuals and society.
Developing an online hate classifier for multiple social media platforms
The proliferation of social media enables people to express their
opinions widely online. However, at the same time, this has resulted in
the emergence of conflict and hate, making online environments
uninviting for users. Although researchers have found that hate is a
problem across multiple platforms, there is a lack of models for online
hate detection using multi-platform data. To address this research gap,
we collect a total of 197,566 comments from four platforms: YouTube,
Reddit, Wikipedia, and Twitter, with 80% of the comments labeled as
non-hateful and the remaining 20% labeled as hateful. We then experiment
with several classification algorithms (Logistic Regression, Naïve
Bayes, Support Vector Machines, XGBoost, and Neural Networks) and
feature representations (Bag-of-Words, TF-IDF, Word2Vec, BERT, and their
combination). While all the models significantly outperform the
keyword-based baseline classifier, XGBoost using all features performs
the best (F1 = 0.92). Feature importance analysis indicates that BERT
features are the most impactful for the predictions. Findings support
the generalizability of the best model, as the platform-specific results
from Twitter and Wikipedia are comparable to their respective source
papers. We make our code publicly available for application in real
software systems as well as for further development by online hate
researchers.
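A minimal sketch of one of the feature/classifier combinations named above (TF-IDF features with Logistic Regression). The handful of toy comments below are illustrative stand-ins for the 197,566-comment multi-platform dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set (1 = hateful, 0 = non-hateful).
comments = [
    "I hate you and everyone like you",
    "you people are disgusting and worthless",
    "get out, nobody wants your kind here",
    "go away, you are subhuman trash",
    "thanks for sharing, great article",
    "interesting point, I learned something new",
    "congrats on the release, well done",
    "this tutorial helped me a lot",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF turns each comment into a weighted term vector; the linear
# model then learns per-term weights toward the hateful/non-hateful classes.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(comments, labels)
print(model.predict(["what a helpful and well done tutorial"]))
```

The paper's best model combined several feature types under XGBoost; the pipeline above shows only the simplest configuration of the ones compared.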
EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
This article introduces a new language-independent approach for creating a
large-scale high-quality test collection of tweets that supports multiple
information retrieval (IR) tasks without running a shared-task campaign. The
adopted approach (demonstrated over Arabic tweets) designs the collection
around significant (i.e., popular) events, which enables the development of
topics that represent frequent information needs of Twitter users for which
rich content exists. That inherently facilitates the support of multiple tasks
that generally revolve around events, namely event detection, ad-hoc search,
timeline generation, and real-time summarization. The key highlights of the
approach include diversifying the judgment pool via interactive search and
multiple manually-crafted queries per topic, collecting high-quality
annotations via crowd-workers for relevancy and in-house annotators for
novelty, filtering out low-agreement topics and inaccessible tweets, and
providing multiple subsets of the collection for better availability. Applying
our methodology on Arabic tweets resulted in EveTAR, the first
freely-available tweet test collection for multiple IR tasks. EveTAR includes a
crawl of 355M Arabic tweets and covers 50 significant events for which about
62K tweets were judged with substantial average inter-annotator agreement
(Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating
existing algorithms in the respective tasks. Results indicate that the new
collection can support reliable ranking of IR systems that is comparable to
similar TREC collections, while providing strong baseline results for future
studies over Arabic tweets.
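The pool diversification step described above (judging the union of results retrieved by multiple manually-crafted queries per topic) can be sketched with a stdlib-only example; the ranked lists, tweet IDs, and pool depth below are illustrative:

```python
def build_judgment_pool(ranked_lists, depth=3):
    """Union of the top-`depth` tweet IDs from each query's ranked list."""
    pool = set()
    for ranking in ranked_lists:
        pool.update(ranking[:depth])
    return pool

# Toy ranked result lists from three manually-crafted queries for one event.
results_per_query = [
    ["t1", "t2", "t3", "t4"],
    ["t3", "t5", "t1", "t6"],
    ["t7", "t2", "t8", "t9"],
]
pool = build_judgment_pool(results_per_query, depth=3)
print(sorted(pool))  # tweets sent to annotators for relevance judgment
```

Pooling from several query formulations per topic widens coverage of relevant tweets beyond what any single query surfaces, which is why the collection pairs it with interactive search.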
Game-based micro-learning approach for language vocabulary acquisition using LingoSnacks
Acquisition of new vocabulary is an important element of language learning, but it requires repeated and varied exposure to new words and their usage. This paper reports the experience of designing and developing a game-based micro-learning platform (named LingoSnacks) for interactive learning of Arabic vocabulary. The LingoSnacks learning platform provides an environment for authoring learning content and delivering it to the learner in game-like interactive learning activities. A usability evaluation was conducted to refine the system and improve its usability. Empirical testing results for students who used LingoSnacks indicate that the participants increased their rate of vocabulary acquisition: the number of new vocabulary items that they could recognize, recall, and retain was significantly higher than for participants who used only conventional classroom lessons.